Multiword noun compound bracketing using Wikipedia

Authors

Caroline Barrière

Pierre André Ménard

Abstract

This research makes two contributions to the multiword noun compound bracketing problem: first, it demonstrates the usefulness of Wikipedia for the task, and second, it presents a novel bracketing method relying on a word association model. The association model is intended to represent combined evidence about the possibly lexical, relational or coordinate nature of the links between all pairs of words within a compound. Wikipedia is promoted for its encyclopedic nature, meaning that it describes terms and named entities, as well as for its size, which is large enough for corpus-based statistical analysis. Both types of information are used to measure evidence about lexical units, noun relations and noun coordinates in order to feed the association model in the bracketing algorithm. Using a gold standard of around 4800 multiword noun compounds, we report a performance of 73% in a strict match evaluation, which compares favourably with results reported in the literature for unsupervised approaches.
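To make the idea of bracketing from pairwise word associations concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a single association score per word pair (the toy numbers below are invented, not Wikipedia statistics) and uses a simple greedy bottom-up merge. The names `leaves`, `association`, `bracket` and `toy_scores` are hypothetical and exist only for this illustration.

```python
from itertools import product


def leaves(node):
    """Return the words under a bracketing node (a word or a 2-tuple of nodes)."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def association(a, b, scores):
    """Average pairwise score between the words of two constituents.

    Assumption: `scores` maps an unordered word pair to a single number;
    unseen pairs count as 0.0.
    """
    pairs = list(product(leaves(a), leaves(b)))
    return sum(scores.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def bracket(words, scores):
    """Greedy bottom-up bracketing: repeatedly merge the adjacent pair
    of constituents with the highest association."""
    nodes = list(words)
    while len(nodes) > 1:
        best = max(
            range(len(nodes) - 1),
            key=lambda i: association(nodes[i], nodes[i + 1], scores),
        )
        nodes[best:best + 2] = [(nodes[best], nodes[best + 1])]
    return nodes[0]


# Invented toy scores, for illustration only.
toy_scores = {
    frozenset(("liver", "cell")): 0.9,
    frozenset(("cell", "antibody")): 0.4,
    frozenset(("liver", "antibody")): 0.1,
}

result = bracket(["liver", "cell", "antibody"], toy_scores)
print(result)  # (('liver', 'cell'), 'antibody')
```

With these toy scores, "liver" and "cell" are merged first, giving a left bracketing. In a fuller system the single score would presumably be replaced by the combined lexical, relational and coordinate evidence the abstract describes.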

Similar resources

A System for Compound Noun Multiword Expression Extraction for Hindi

Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...

A Dataset for Joint Noun-Noun Compound Bracketing and Interpretation

We present a new, sizeable dataset of noun–noun compounds with their syntactic analysis (bracketing) and semantic relations. Derived from several established linguistic resources, such as the Penn Treebank, our dataset enables experimenting with new approaches towards a holistic analysis of noun–noun compounds, such as joint learning of noun–noun compounds bracketing and interpretation, as well...

Scaling Up BioNLP: Application of a Text Annotation Architecture to Noun Compound Bracketing

We describe the use of the Layered Query Language and architecture to acquire statistics for natural language processing applications. We illustrate the system's use on the problem of noun compound bracketing using MEDLINE.

Search Engine Statistics Beyond the n-Gram: Application to Noun Compound Bracketing

In order to achieve the long-range goal of semantic interpretation of noun compounds, it is often necessary to first determine their syntactic structure. This paper describes an unsupervised method for noun compound bracketing which extracts statistics from Web search engines using a χ² measure, a new set of surface features, and paraphrases. On a gold standard, the system achieves results of 89....

Linked Open Data and Web Corpus Data for noun compound bracketing

This research provides a comparison of a linked open data resource (DBpedia) and web corpus data resources (Google Web Ngrams and Google Books Ngrams) for noun compound bracketing. Large corpus statistical analysis has often been used for noun compound bracketing, and our goal is to introduce a linked open data (LOD) resource for this task. We show its particularities and its performance on the...

Publication date: 2014
